The Ckip Chinese Treebank: Guidelines for Annotation
نویسندگان
چکیده
This paper aims to present the methodology and guidelines for annotation in CKIP Chinese Treebank. Under the framework of the Information-based Case grammar (ICG), a lexical feature-based grammar formalism, which stipulates each lexical item containing both syntactic and semantic information, the potential phrasal heads of input are located and the semantic relations between words are also identified. Thus, not only phrasal categories but also thematic roles are both annotated. Incorporating with Head-Driven Principle, some guidelines are also implemented for more consistent annotation in such grammatical phenomenon as the constructions of coordinates, topicalization, and the construction with nominal predicate. In addition, we tag the CKIP Treebank with semantic categories to extract useful collocation among semantic classes of the bracketed constitutes, which is also supposed to further enhance the performance of our parsing model.
منابع مشابه
The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0)
This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. The POS tagging guidelines have been revised several tim...
متن کاملAutomatic Adaptation of Annotations
Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, but it could be remedied by integrating knowledge across corpora with different annotation guidelines. In this article we describe the pro...
متن کاملIterative Transformation of Annotation Guidelines for Constituency Parsing
This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training d...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملThe Bracketing Guidelines for the Penn Chinese Treebank (3.0)
This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. This document can be divided into six parts. Section I discusses six funda...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999